ABSTRACT. When the composition and structure of communities are compared using traditional indices, the problem arises of evaluating the results adequately, because statistical criteria for such an evaluation are lacking. To resolve this problem, a new method of community comparison is advanced, based on an empirically obtained similarity standard. Soil springtail communities, sea phytoplankton and sea macrobenthos communities serve as model objects. The widely used Jaccard similarity index and Shorygin's coefficient (the sum of the minimum relative abundances of species in the samples compared) are chosen as examples. Empirical distributions of these indices are studied for samples taken both in ecologically remote and in similar communities. Significance levels are determined for deciding on the degree of similarity in species composition and species structure. An express method for creating a similarity standard of species structure is developed, based on regular observations under particular ecological conditions. Using springtail populations, we show how to select a standard dataset for applying this index when comparing communities from various ecosystems and when analyzing seasonal and between-year changes in communities within a single habitat. The use of a similarity standard makes cluster analysis and dendrogram construction unnecessary, thus avoiding the multiplicity of data interpretations they give rise to.




Introduction 


Comparisons of the species compositions of samples taken in diverse habitats and/or in different seasons, or subjected to various external impacts, are among the approaches most frequently used in the study of communities. The species composition and/or the community structure are compared by applying one similarity index or another. If more than two communities are involved, the results are then usually presented as a dendrogram. The choice of a clustering method and of a suitable similarity measure is determined by the objectives of the study and the peculiarities of the underlying material. The literature devoted to these problems is highly diverse; a detailed account can be found in the still relevant monograph by Pesenko [1982]. The same problems have been discussed constructively in a more recent publication [Shitikov et al., 2005].

The diversity of methods described in the literature for measuring the similarity or dissimilarity of species lists in samples seems to stem from the very notion of similarity or dissimilarity, which is intuitively clear but resists an unambiguous definition whenever one attempts to propose a mathematically substantiated measure for it. So the question arises whether it makes sense at all to discuss the advantages of one measure over another on the basis of how it is calculated, when we are only vaguely aware of what exactly we are to measure.

From a practical viewpoint, however, another question that arises when species lists from two samples are compared is more important: can the differences found between the lists be considered evidence that the samples were taken in different communities, in different seasons, in habitats differing in the degree of pollution, etc.? Or are these differences due to sampling inaccuracies alone? Or do they stem from errors in the abundance estimates (counts) of each species? When experimental data are analyzed, this problem is usually formulated in terms of mathematical statistics as the following question: at what significance level can the null hypothesis of no difference between the samples be rejected, given the available sample values of their characteristics?

In the literature, such problems are only seldom discussed in connection with classification methods, likely because most of these methods are inherently non-statistical: null hypotheses are formulated neither in the theoretical design nor in the software implementation of the respective algorithms, while the characters of the objects to be classified, whether measured on a relative or an absolute scale, are treated as fixed values. The fact that they are distributed as random variables is simply ignored.

The present paper advances the notion of a similarity standard: an independently created standard dataset against which any set of samples taken in the course of an ecological study can be checked for samples similar in species composition and structure.

Of course, since there is no clear definition of similarity, it is hardly possible to propose a similarity standard applicable to all situations. For such a particular task as comparing the species compositions of communities, however, a similarity standard was advanced long ago in the form of “any set of parallel samples” [Maximov, 1984]. A practical implementation of this approach is presented below, with particular situations taken as examples.


Material and methods 


The species compositions of soil Collembola, phytoplankton and sea macrobenthos served as input material. Despite the apparent differences between these objects, the abundance estimates of the organisms studied had about the same statistical reliability. In all cases, the total number of individuals counted in each separate sample ranged from several dozen to 100–300 specimens, while the average abundance of a species in a sample, determined as the geometric mean of the species' numbers in that sample, did not exceed 5–6. The geometric mean was chosen because the distribution of numbers among the species in a sample is known to be close to exponential. In any event, the rank distribution of the logarithms of abundance very often looks like a simple linear regression. As is easy to see, the arithmetic mean of the log values is the logarithm of their geometric mean.
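The link between the mean of logarithms and the geometric mean can be checked with a short sketch. It assumes (our assumption, not stated in the source) that a sample is given as a list of per-species counts and that only species actually present (n > 0) enter the mean; the function name is hypothetical.

```python
import math


def geometric_mean_abundance(counts):
    """Geometric mean of per-species counts in one sample.

    Assumption: species with a zero count (absent from the sample) are
    skipped, because log(0) is undefined.
    """
    present = [n for n in counts if n > 0]
    if not present:
        raise ValueError("sample contains no individuals")
    mean_log = sum(math.log(n) for n in present) / len(present)
    # The arithmetic mean of the logs is the log of the geometric mean.
    return math.exp(mean_log)


# Hypothetical sample with four species: 1, 2, 4 and 8 individuals.
sample = [1, 2, 4, 8]
print(geometric_mean_abundance(sample))  # about 2.83, i.e. 2 ** 1.5
print(sum(sample) / len(sample))         # arithmetic mean: 3.75
```

With a strongly skewed abundance distribution, the arithmetic mean is pulled upwards by the few dominant species, whereas the geometric mean stays closer to the abundance of a typical species.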

The species composition of soil-dwelling springtails was determined from samples taken in 1983 with 5 × 5 cm square frames on 30 × 30 cm plots in a southern taiga lichen-moss pine forest on Silon Island in the Darwin State Biosphere Nature Reserve. From each plot, 36 samples were taken, representing the following microsites: lichens (1,612 specimens, 26 springtail species); lichen with a patch of green moss (2,761 specimens, 26 species); a diffuse mixture of lichen and green moss (1,641 specimens, 20 species); green moss with a patch of lichen (2,290 specimens, 21 species); and green moss (6,646 specimens, 21 species). The numbers per species per sample on these plots averaged 3.1, 5.8, 3.3, 4.7 and 6.8, respectively. The material was extracted with Tullgren funnels and then fixed using standard techniques [Potapov & Kuznetsova, 2011].

In addition to the above springtail series, used in the chapter “Analysis of the similarity matrix” in Potapov & Kuznetsova [2011], we also incorporated data on springtail numbers obtained by summing the samples taken in “lines” between two trees (five samples per “line”) in five further forest ecosystems. These were bilberry spruce forests and oak woodlands in the Mordovian Nature Reserve and in the environs of the city of Vilnius, as well as a bilberry spruce forest in the south of Arkhangelsk Region sampled in two successive years [Kuznetsova, 2005].

Phytoplankton sampling in the White and Kara seas was performed by staff members of the Department of General Ecology and Hydrobiology of Moscow State University more than 30 years ago. In both cases, 50 samples of 1 litre each were taken from the surface layer from an anchored boat. The samples were first fixed with Lugol's solution and then concentrated by sedimentation. The species compositions were determined in counting chambers, five chambers being scrutinized for each parallel sample. The counts in each chamber were used as subsamples for calculating the similarity indices [Koltsova et al., 1971; Likhacheva et al., 1979]. In Chupa Bay, White Sea, each sample contained on average 10 species and 60 cells in June, and 19 species and 520 cells in August. In the Kara Sea in August, each subsample comprised on average 12 species and 28 cells, and each sample 30 species and 252 cells.

Fig. 1. Intervals of variation of Jaccard's index in parallel samples (“box-and-whisker” plots; ordinate: Jaccard's index). Designations on the abscissa: k55, ch51, ch35, k20, ch50, phytoplankton samples; L, D, LM, ML, M, soil samples from Silon Island; 4L, 4D, 4LM, 4ML, 4M, the same samples combined by fours; BS, benthos samples from the Barents Sea shelf.

Data on macrobenthos were kindly provided by N.V. Kucheruk. The material had been collected on the Barents Sea shelf in five dredge samples taken at each of eight stations.

From the great variety of similarity indices we chose only two as examples. Similarity in species composition was analyzed using the above-mentioned Jaccard index:

$$JCR = \frac{c}{a + b - c},$$

where $a$ is the number of species in list A, $b$ is that in list B, and $c$ is the number of species shared by both lists.
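A minimal Python sketch of Jaccard's index as defined above, treating each species list as a set. The function name is our own, and returning 0 for two empty lists is an assumption (the formula is undefined in that case).

```python
def jaccard(list_a, list_b):
    """Jaccard's index JCR = c / (a + b - c) for two species lists."""
    set_a, set_b = set(list_a), set(list_b)
    a, b = len(set_a), len(set_b)
    c = len(set_a & set_b)          # species shared by both lists
    if a + b - c == 0:              # both lists empty: defined here as 0
        return 0.0
    return c / (a + b - c)


# Hypothetical lists: three species each, two of them shared.
print(jaccard(["sp1", "sp2", "sp3"], ["sp2", "sp3", "sp4"]))  # 0.5
```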

To evaluate similarity in species structure, Shorygin's coefficient, a likewise popular similarity index, was applied:

$$SHR = \sum_i \min(p_{i1}, p_{i2}),$$

where $\min(p_{i1}, p_{i2})$ is the lower of the two relative abundances of species $i$ in the compared samples, $p_{ij} = n_{ij}/N_j$, $n_{ij}$ being the abundance of species $i$ in sample $j$ and $N_j = \sum_i n_{ij}$. This index is easy to calculate with any statistics software package that includes cluster analysis, because such packages, among other things, compute the so-called Manhattan distance, or City Block Metric:

$$CBM = \sum_i |p_{i1} - p_{i2}|,$$

$$SHR = 1 - \frac{CBM}{2}$$

[Pesenko, 1982].
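The sketch below computes Shorygin's coefficient directly from counts and checks the identity SHR = 1 − CBM/2 on the same pair of samples. Samples are represented, by assumption, as dictionaries mapping species names to counts; the function names and data are hypothetical.

```python
def relative_abundances(sample):
    """p_ij = n_ij / N_j for a dict {species: count}."""
    total = sum(sample.values())
    return {species: n / total for species, n in sample.items()}


def shorygin(sample_1, sample_2):
    """SHR = sum over species of min(p_i1, p_i2)."""
    p1, p2 = relative_abundances(sample_1), relative_abundances(sample_2)
    # A species missing from one sample has p = 0 there and adds nothing.
    return sum(min(p1[s], p2[s]) for s in p1.keys() & p2.keys())


def city_block(sample_1, sample_2):
    """CBM = sum over species of |p_i1 - p_i2| (Manhattan distance)."""
    p1, p2 = relative_abundances(sample_1), relative_abundances(sample_2)
    species = p1.keys() | p2.keys()
    return sum(abs(p1.get(s, 0.0) - p2.get(s, 0.0)) for s in species)


s1 = {"a": 6, "b": 3, "c": 1}
s2 = {"a": 2, "b": 2, "d": 6}
print(shorygin(s1, s2))            # about 0.4
print(1 - city_block(s1, s2) / 2)  # the same value, since SHR = 1 - CBM/2
```

The identity follows from min(x, y) = (x + y − |x − y|)/2 and the fact that the relative abundances in each sample sum to 1.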


Results and discussion 


We created matrices of the Jaccard and Shorygin indices for each of the above datasets. To test how similarity depends on sample size, we also built matrices for sums of every four soil samples taken in each of the five quadrat plots on Silon Island. This operation is obviously analogous to summing the phytoplankton cell counts in the five chambers (subsamples) examined for each water sample. The ranges of the values obtained for both similarity indices are presented in Figs 1 and 2. Along the abscissa, the samples are arranged in order of increasing mean abundance of species per sample.
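Pooling samples and collecting the values of a similarity matrix can be sketched as follows. This is an illustration under stated assumptions, not the authors' software: samples are dictionaries of species counts, neighbouring samples are assumed to be stored consecutively, an incomplete final group is dropped, and all names are hypothetical.

```python
from collections import Counter
from itertools import combinations


def pool_samples(samples, group_size):
    """Sum the counts of consecutive samples in groups of `group_size`.

    Assumption: neighbouring samples are stored consecutively, and an
    incomplete final group is dropped.
    """
    pooled = []
    for start in range(0, len(samples) - group_size + 1, group_size):
        total = Counter()
        for sample in samples[start:start + group_size]:
            total.update(sample)   # adds counts species by species
        pooled.append(dict(total))
    return pooled


def pairwise_values(samples, index):
    """The n(n-1)/2 off-diagonal values of a similarity matrix."""
    return [index(s1, s2) for s1, s2 in combinations(samples, 2)]


# 36 hypothetical single samples from one plot, pooled by fours.
plot = [{"sp1": 2, "sp2": 1} for _ in range(36)]
combined = pool_samples(plot, 4)
print(len(combined), combined[0])                        # 9 {'sp1': 8, 'sp2': 4}
print(len(pairwise_values(combined, lambda x, y: 1.0)))  # 36
```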

The patterns of variation in the similarity indices for phytoplankton samples differ little from those for soil springtails. Nor are any peculiarities observed in the summed similarity indices for macrobenthos. Because the sampling methods for phytoplankton, soil microarthropods and macrobenthos, as well as the subsequent laboratory processing, differ considerably, one can conclude that these methodological differences have virtually no influence on the estimates of statistical variation.

Fig. 2. Intervals of variation of Shorygin's index in parallel samples. Designations as in Fig. 1.

No relationship between the mean value and sample size is revealed for Shorygin's index. This is due to the index's low sensitivity to variations in the abundance of rare species. The range of variation of this index is even considerably smaller for the combined samples obtained by uniting soil samples by fours (4L, 4D, 4LM, 4ML, 4M) or phytoplankton samples by five subsamples (k20, ch50). In contrast, the mean values of the Jaccard index tend to increase, and its range of variation (the difference between the minimum and maximum values of the index) tends to decrease, as the average abundance per sample grows. Clearly this is related to the increasing reliability of species identification as the size of the analyzed sample grows, because the scatter of the index values in this case is due only to identification errors.
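The different sensitivity of the two indices to rare species can be illustrated with a toy pair of samples that differ only by a single individual of one species. The data are hypothetical, and the two functions simply restate the definitions from Material and methods in compact form.

```python
def jaccard(sample_1, sample_2):
    """Presence/absence only: c / (a + b - c)."""
    a, b = set(sample_1), set(sample_2)
    c = len(a & b)
    return c / (len(a) + len(b) - c)


def shorygin(sample_1, sample_2):
    """Sum of the minimum relative abundances of shared species."""
    n1, n2 = sum(sample_1.values()), sum(sample_2.values())
    shared = sample_1.keys() & sample_2.keys()
    return sum(min(sample_1[s] / n1, sample_2[s] / n2) for s in shared)


# Hypothetical samples that differ only by one singleton species "r".
s1 = {"a": 50, "b": 30, "c": 19, "r": 1}
s2 = {"a": 50, "b": 30, "c": 20}
print(jaccard(s1, s2))   # 0.75
print(shorygin(s1, s2))  # about 0.99
```

A single specimen removes a quarter of the Jaccard similarity but barely changes Shorygin's index, which weights every species by its relative abundance.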

Let us recall that “errors” as understood here include not only purely technical ones related to taking samples, fixing material, etc., but also the differences between parallel samples caused by the uneven distribution of individuals within the space of a habitat. For this reason, increasing the counts of individuals by summing several separate samples does not bring even the maximum values of the indices, let alone their mean values, significantly closer to their theoretical value, i.e. 1. When similarities are evaluated, the uneven (aggregated) distribution of organisms contributes more to sampling error than do the purely technical errors of counting each sample.

Therefore, an empirical distribution function derived from a sufficiently large series of parallel samples can serve as a similarity standard, regardless of the similarity index chosen. We believe that the notion “sufficiently large” should not raise serious doubts. In fact, it is nothing more (but also nothing less) than an expert judgment.
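An empirical distribution function of this kind is straightforward to build from the pooled index values of parallel samples. The sketch below is a generic ECDF under the assumption that the standard is a flat list of index values; the values themselves are hypothetical.

```python
from bisect import bisect_right


def empirical_cdf(values):
    """Return F(x) = share of standard index values that are <= x."""
    ordered = sorted(values)
    n = len(ordered)

    def cdf(x):
        return bisect_right(ordered, x) / n

    return cdf


# Hypothetical index values pooled from matrices of parallel samples.
standard = [0.31, 0.45, 0.52, 0.52, 0.60, 0.66, 0.71, 0.74, 0.80, 0.83]
F = empirical_cdf(standard)
print(F(0.30), F(0.52), F(0.90))  # 0.0 0.4 1.0
```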


The distribution diagrams presented above for the Jaccard and Shorygin indices (Figs 1 and 2) are based on similarity matrices calculated for 10 series containing 20–50 samples each, so each matrix contained from 200 to 1,200 index values. Summing the frequencies of occurrence of these values over all the similarity matrices for individual samples (samples k55, L, LM, D, ch51, BS, ch35, ML), we obtain a total of 2,800 values each for the Jaccard and Shorygin indices. Similarly, for the samples obtained by combining four neighbouring samples of Collembola or five subsamples of phytoplankton, we obtain 1,100 values of each index. In each individual sample, the geometric mean of the species' numbers did not exceed 5, with 3 to 15 species present. In the combined samples, the geometric mean of species abundance ranged from 6 to 12, while species richness reached 20 in some samples and was never lower than 9.
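As a check on the matrix sizes quoted above: a similarity matrix built for $n$ samples contains $n(n-1)/2$ distinct off-diagonal values, so for series of 20 and 50 samples

$$\frac{20 \cdot 19}{2} = 190, \qquad \frac{50 \cdot 49}{2} = 1225,$$

i.e. roughly 200 and 1,200 values per matrix.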

To use either index as a similarity test, it is enough to know only the “tails” of the respective empirical distribution function. First of all, the frequency of occurrence of the minimum values must be estimated, because it is logical to accept H0: JCR = 0 or H0: SHR = 0 as the null hypothesis. Let us explain this rather strong statement.

To verify the reliability of differences between measurement results, a null hypothesis of no difference, i.e. that the difference equals 0, is formulated and tested with a defined probability of a type I error. Most often this is a difference between the arithmetic means of two independent series of measurements (samples). Significant differences are evidence that the means estimate the mathematical expectations of two different general populations. At a chosen significance level of 5%, the probability of erring in such an assumption must not exceed 0.05.

For our purposes, the hypothesis that the samples differ is of no practical interest. It is too naive to expect that two samples taken even in the same habitat would, after counting, yield two lists in which the species (species composition) and their relative numbers are absolutely identical. The following question is far more important for solving problems of community classification: is it justified to regard an association of organisms observed at a given moment (a taxocene of springtails in soil samples, a group of benthic organisms on a homogeneous bottom plot, an assemblage of planktonic algae or invertebrates within a hydrologically homogeneous water mass, etc.) as a community, or at least as part of one and the same biocoenosis? To arrive at a conclusion, we need to check the spatio-temporal stability of the species composition and species structure of the group of organisms under study. This means that, having taken samples in similar habitats during the same season, we must test whether all the observed differences are due only to sampling errors and to the heterogeneous distribution of the organisms.

Therefore, the hypothesis to be tested is that of the absence of similarity, i.e. that the chosen similarity measure equals 0. Exceeding a certain threshold, found from a standard set of values of the respective similarity index, would then indicate that all differences between the samples compared (even those taken from different communities and ecosystems) do not exceed the differences between parallel samples, i.e. they are due only to the aggregated distributions of the species found and to technical errors of sampling and analysis. If this standard threshold is not exceeded, we remain in the same uncertain situation as in classical tests of the so-called significance of differences, only with the opposite sign: a sample estimate of a similarity index that does not exceed the critical value cannot be considered a good reason for saying that the samples compared were taken in different ecosystems or different habitats.
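The one-sided decision rule described above can be sketched as follows. The sketch assumes that the similarity standard is available as a flat list of index values pooled from matrices of parallel samples. The function names, the default level of 1% and the fractile convention (the smallest value whose empirical cumulative share reaches the chosen level) are our own choices; other quantile conventions give slightly different critical values.

```python
import math


def critical_value(standard_values, level):
    """Lower fractile of the standard: smallest value v with F(v) >= level."""
    ordered = sorted(standard_values)
    k = max(0, math.ceil(level * len(ordered)) - 1)
    return ordered[k]


def exceeds_standard(observed, standard_values, level=0.01):
    """True: H0 'similarity = 0' is rejected; the pair is as similar as
    parallel samples. False: no conclusion can be drawn either way."""
    return observed > critical_value(standard_values, level)


# Hypothetical standard of 100 index values 0.01, 0.02, ..., 1.00.
standard = [i / 100 for i in range(1, 101)]
print(critical_value(standard, 0.01), critical_value(standard, 0.05))  # 0.01 0.05
print(exceeds_standard(0.50, standard))  # True
```

Note the asymmetry of the rule: a False result does not mean that the samples come from different communities, only that the standard gives no grounds for calling them similar.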

Table 1. Fractiles of empirical distributions for Jaccard's and Shorygin's indices.

                 Jaccard's index                     Shorygin's index
Fractile    single     sum for samples          single     sum for samples
            samples    combined by 4 and 5      samples    combined by 4 and 5
…           0.09       0.30                     0.17       0.35
…           0.11       0.31                     0.21       0.38
1%          0.14       0.38                     0.36       0.46
…           0.17       0.42                     0.43       0.52
5%          0.20       0.45                     0.49       0.58
…           0.23       0.50                     0.55       0.64

However, one must keep in mind that the n(n − 1)/2 index values in a similarity matrix defined for n samples cannot be regarded as independent realizations, because they are correlated. This correlation is easy to illustrate. If a study set contains two samples with completely identical species compositions, their similarities to the remaining n − 2 samples will be represented by two equal sets of values. Therefore, if a similarity matrix contains at least one index value equal to 1, then n − 2 other values will each occur in the matrix at least twice. For the same reason, a single sample differing anomalously in species composition from the remaining samples produces n − 1 anomalously small values of Shorygin's index. Yet this cannot strongly affect the empirical distribution function when, as is usually done, it is built from the relative frequencies of occurrence of each index value. A different matter is that we consider it difficult to evaluate confidence probabilities in a mathematically strict way from relative frequencies obtained in this manner. Therefore, the threshold values of the similarity indices obtained below should be regarded only as expert judgments applicable to preliminary studies, an “exploratory data analysis” in the terms of Tukey [1981].
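The dependence between matrix entries can be seen on a toy example in which two of four hypothetical species lists are identical: every similarity of the duplicated sample to the others appears twice in the matrix. The data and names are illustrative only.

```python
from itertools import combinations


def jaccard(list_a, list_b):
    """Jaccard's index c / (a + b - c) for two species lists."""
    a, b = set(list_a), set(list_b)
    c = len(a & b)
    return c / (len(a) + len(b) - c)


# Hypothetical species lists; samples 0 and 1 are identical.
samples = [{"a", "b", "c"}, {"a", "b", "c"}, {"a", "b", "d"}, {"a", "e", "f"}]
matrix = {(i, j): jaccard(samples[i], samples[j])
          for i, j in combinations(range(len(samples)), 2)}

# Similarities of samples 0 and 1 to every other sample coincide,
# so each of these n - 2 values appears twice in the matrix.
row_0 = [matrix[(0, k)] for k in range(2, len(samples))]
row_1 = [matrix[(1, k)] for k in range(2, len(samples))]
print(matrix[(0, 1)], row_0, row_1)  # 1.0 [0.5, 0.2] [0.5, 0.2]
```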

Table 1 shows the fractiles of the distribution functions found, corresponding to the significance levels experimentalists are accustomed to (“thresholds of faultless forecasts”, in the terms of Plokhinskiy [1970]).

Consequently, if we have a series of samples in which the geometric mean of a species' numbers does not exceed 3 specimens (or, nearly equivalently, the total counts do not exceed 200 individuals representing no more than 10–15 species per sample), the similarity matrix may contain 5% of Jaccard index values below 0.20 and 1% below 0.14, even though all these samples were taken at the same place and at the same time. Table 1 refers to such samples as single (entomologists simply call them samples, planktonologists subsamples). Because practical ecological studies compare dozens of samples, the corresponding matrices may contain several hundred values of the similarity index. In that case 1% of the total number of values does not look as “sufficiently low” as it does when ordinary statistical hypotheses are tested.
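The point about matrix size can be made concrete with a little arithmetic. The sketch uses the single-sample thresholds quoted in the text (JCR < 0.20 for 5% and JCR < 0.14 for 1% of values); the study size of 30 samples and the function names are hypothetical.

```python
# Lower fractiles of Jaccard's index for single samples, as quoted in the text:
# 5% of values fall below 0.20 and 1% below 0.14.
JACCARD_SINGLE = {0.05: 0.20, 0.01: 0.14}


def n_pairs(n_samples):
    """Number of distinct index values in a similarity matrix."""
    return n_samples * (n_samples - 1) // 2


def expected_below(n_samples, level):
    """Expected count of parallel-sample values below the `level` fractile."""
    return level * n_pairs(n_samples)


# A hypothetical study with 30 single samples from one habitat.
print(n_pairs(30))  # 435
for level, threshold in JACCARD_SINGLE.items():
    # Even with no real community differences, this many values of JCR
    # are expected to fall below the threshold.
    print(threshold, expected_below(30, level))
```

With 435 values, roughly four pairs of genuinely parallel samples would still be expected to show JCR < 0.14.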

Table 2. Similarity of species structure by Shorygin's index (SHR, %) for samples from forest ecosystems.

      A1  A2  A3  A4  B1  B2  B3  B4  C1  C2  C3  D1  D2  D3  E1  E2  E3  E4  F1  F2  F3  F4
A1     -  70  50  69  28  37  27  33  46  48  51  40  34  48  30  31  33  31  33  33  33  33
A2    70   -  54  62  19  28  19  24  42  43  47  42  36  56  26  34  37  37  30  40  39  36
A3    50  54   -  45  19  27  18  23  31  34  35  34  30  39  21  21  21  20  20  23  23  20
A4    69  62  45   -  18  26  17  22  46  50  47  41  43  49  24  24  24  20  21  23  23  21
B1    28  19  19  18   -  81  87  85  25  20  25  14  14  15  13  10   …   …  16  17  17  21
B2    37  28  27  26  81   -  78  84  30  26  32  20  19  21  18  16  16  16  21  22  22  26
B3    27  19  18  17  87  78   -  86  27  23  29  18  18  18  15  12  14  13  18  19  17  22
B4    33  24  23  22  85  84  86   -  30  25  31  21  21  22  14  11  12  12  17  18  17  21
C1    46  42  31  46  25  30  27  30   -  82  80  39  47  50  30  33  34  32  39  36  33  36
C2    48  43  34  50  20  26  23  25  82   -  75  43  49  51  31  33  36  31  29  34  33  31
C3    51  47  35  47  25  32  29  31  80  75   -  42  43  55  28  29  31  32  33  34  33  35
D1    40  42  34  41  14  20  18  21  39  43  42   -  75  75  34  32  32  32  35  32  34  33
D2    34  36  30  43  14  19  18  21  47  49  43  75   -  70  23  22  23  20  21  23  23  21
D3    48  56  39  49  15  21  18  22  50  51  55  75  70   -  36  39  44  44  36  44  46  42
E1    30  26  21  24  13  18  15  14  30  31  28  34  23  36   -  77  76  55  73  63  65  64
E2    31  34  21  24  10  16  12  11  33  33  29  32  22  39  77   -  78  59  69  60  60  55
E3    33  37  21  24   …  16  14  12  34  36  31  32  23  44  76  78   -  63  72  67  69  65
E4    31  37  20  20   …  16  13  12  32  31  32  32  20  44  55  59  63   -  57  64  75  61
F1    33  30  20  21  16  21  18  17  39  29  33  35  21  36  73  69  72  57   -  73  74  71
F2    33  40  23  23  17  22  19  18  36  34  34  32  23  44  63  60  67  64  73   -  82  78
F3    33  39  23  23  17  22  17  17  33  33  33  34  23  46  65  60  69  75  74  82   -  76
F4    33  36  20  21  21  26  22  21  36  31  35  33  21  42  64  55  65  61  71  78  76   -

When similarity in species composition is explored with Jaccard's index, there is hardly any sense in using single samples (in the above sense). Instead, sample sizes should be chosen so that the total number of counted individuals considerably exceeds 200, the number of species per sample is at least 10, and the geometric mean abundance is at least 8–9 specimens. Another approach is also possible: because the above-mentioned quantitative characteristics become available only after laboratory processing of the samples, the sums of several single samples taken simultaneously close to one another can serve as the initial data for calculating the similarity indices. In our case, this corresponds to summing 4 or 5 single samples, which generally agrees with the standards practiced by entomologists and planktonologists.
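The sample-size guidance for Jaccard-based comparisons can be turned into a simple check. This is a hypothetical helper: the thresholds come from the text, but reading “considerably exceeds 200” as a plain `> 200` is a simplification, so `min_total` may reasonably be set higher.

```python
import math


def meets_size_guideline(counts, min_total=200, min_species=10, min_geo_mean=8):
    """Hypothetical check of the guideline for Jaccard-based comparisons.

    Thresholds follow the text: total count above 200, at least 10 species
    and a geometric mean abundance of at least 8-9 specimens.
    """
    present = [n for n in counts if n > 0]
    if not present:
        return False
    geo_mean = math.exp(sum(math.log(n) for n in present) / len(present))
    return (sum(present) > min_total
            and len(present) >= min_species
            and geo_mean >= min_geo_mean)


print(meets_size_guideline([4] * 12))   # False: only 48 individuals
print(meets_size_guideline([12] * 20))  # True: 240 individuals, 20 species
```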

The introduction of the notion of a “similarity standard” makes it possible to solve a number of problems that arise when the species composition and species structure of communities are compared using such traditional methods of multivariate analysis as cluster analysis, multidimensional scaling, etc. These include, first of all, the uncertainty in choosing a similarity measure for samples taken in natural ecosystems, as well as the difficulties of analyzing similarity matrices. The various methods of analyzing similarity matrices, most of which are non-stochastic in the true sense of the word, often lead to fundamentally different results, even when the same similarity index is applied.

Let us use the similarity matrix calculated with Shorygin's index (Tab. 2) for the numbers of springtails in five forest ecosystems, obtained by combining every five samples taken in “lines” between two trees. A and B are such summed samples from a bilberry spruce forest and an oak woodland in the Mordovian Nature Reserve, respectively. C and D are similar sums for an oak forest and a broadleaved-coniferous forest in the environs of Vilnius, respectively. E and F represent “true” repetitions in the fullest sense of the word, resulting from sampling in the same bilberry spruce forest at Ramenye in 1980 (E) and 1981 (F).

In Tab. 1, we find the value SHR = 45%, which roughly corresponds to the 1% fractile. If we accept it as the minimum standard value and then mark in boldface all values of SHR > 45% in Tab. 2, it becomes apparent that the samples from each ecosystem form groups clearly isolated from one another. This primarily shows that, within each forest, the similarity of the springtail samples in species composition corresponds well to the similarity standard. In addition, it is obvious that the species composition in all samples taken in 1980 in the bilberry spruce forest at Ramenye is similar to that in the 1981 samples. Viewed from a different angle, in species structure the Ramenye collembolan population sampled in 1980 and 1981 is as similar as the samples taken on the same day on Silon Island on a plot of less than 1/4 m². Let us recall that it is the pooled set of SHR values for these samples, together with those for the phytoplankton and macrobenthos samples, that we have accepted as a similarity standard.
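Marking the matrix cells that exceed the standard threshold is a simple filter. The sketch uses a small excerpt of Table 2 (samples A1, A2, B1 and B2, values in %); the names are our own.

```python
# Excerpt of Table 2 (Shorygin's index, %) for samples A1, A2, B1, B2.
SHR_EXCERPT = {
    ("A1", "A2"): 70, ("A1", "B1"): 28, ("A1", "B2"): 37,
    ("A2", "B1"): 19, ("A2", "B2"): 28, ("B1", "B2"): 81,
}


def pairs_above(shr_values, threshold=45):
    """Pairs whose similarity exceeds the standard threshold (the 'bold' cells)."""
    return sorted(pair for pair, value in shr_values.items() if value > threshold)


print(pairs_above(SHR_EXCERPT))  # [('A1', 'A2'), ('B1', 'B2')]
```

Only within-forest pairs pass the 45% threshold here, which is the pattern described above for the full table.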

The conclusion seems quite sound that, during the year between the two sampling repetitions at Ramenye, the springtail species composition did not change significantly, although we are unable to provide a strict statistical evaluation of the reliability of this conclusion. It is noteworthy, however, that the traditional methods of multivariate analysis cannot provide such an evaluation either.

Fig. 3. Dendrogram obtained by the weighted between-group average method, based on the data in Table 2.

Tab. 2 also shows some samples from different forests in which the values of Shorygin's index exceed, albeit by no more than 10%, the lower threshold of SHR = 45% that we have accepted as a similarity standard. Interestingly, samples D2 and D3 from a broadleaved-coniferous stand in Lithuania appear to be similar both to samples C1–C3 from a Lithuanian woodland and to samples A1, A2 and A4 from a Mordovian bilberry spruce forest. Less surprisingly, but importantly enough, the samples from the two forests near Vilnius are considerably more similar to each other than to samples from the other woodlands.

It seems useful to compare the above conclusions, which are based on the similarity matrix in Tab. 2, with what could be obtained using traditional methods of analysis. Fig. 3 shows a dendrogram derived from a matrix of Manhattan distances (CBM) calculated from the same springtail dataset for five woodlands that forms Tab. 2.

The weighted between-group average method was chosen from the usual set of linkage methods (nearest neighbour, farthest neighbour, Ward's method, etc.) on the advice of A.T. Terekhin, who is highly experienced in applying cluster analysis to ecological problems. As usual in any analysis of dendrograms, the most difficult part is choosing a “threshold” CBM value to separate one cluster from another. If one sticks to the value SHR = 45% (i.e. CBM = 1.10) proposed earlier, a clear-cut differentiation into five clusters can be seen, as in Tab. 2, each cluster corresponding to one of the study forests. But there is no CBM value at which Fig. 3 would show that some samples from Lithuanian woodlands are similar in species structure to those from coniferous forests of European Russia.
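The correspondence between the two thresholds follows directly from the identity SHR = 1 − CBM/2 given in Material and methods:

$$CBM = 2\,(1 - SHR) = 2\,(1 - 0.45) = 1.10.$$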

With the development and spread of personal computers and their statistical software, multidimensional scaling techniques have gained much popularity, quite undeservedly in our opinion. In the number of available methods and algorithms, these techniques are steadily catching up with cluster analysis. Based on our own experience with these methods, limited and mostly negative as it is, we shall restrict ourselves to a single example.

Fig. 4 depicts a so-called MDS diagram based on the same dataset as Tab. 2.

Fig. 4. MDS diagram for the dataset in Table 2.

It is enough to compare the distances between dots in this diagram with the initial SHR values in Tab. 2 to become convinced that the algorithm exaggerates both the differences and the similarities. This could also be seen in the so-called Shepard diagram, which we omit as redundant. Thus, cluster E (Ramenye) looks more compact than cluster B (Mordovian oak wood). Meanwhile, the mean CBM distance within group B amounts to 0.33, whereas within group E it is 0.64, i.e. nearly twice as large. If the dot labels were removed from Fig. 4, it would never be possible to recognize the A1–A4 cluster (Mordovian spruce stand) as separate from the C1–C3 and D1–D3 dots. Therefore, even our deliberately simple example shows that both cluster analysis and multidimensional scaling can result in a loss of useful information, in misleading results, or both.
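The within-group mean distances quoted above can be recomputed from Table 2 by converting each pairwise SHR value with CBM = 2(1 − SHR). The six within-group pairs for B and E are copied from the table; the function name is our own.

```python
# Within-group SHR values (%) from Table 2.
GROUP_B = [81, 87, 85, 78, 84, 86]  # B1-B2, B1-B3, B1-B4, B2-B3, B2-B4, B3-B4
GROUP_E = [77, 76, 55, 78, 59, 63]  # E1-E2, E1-E3, E1-E4, E2-E3, E2-E4, E3-E4


def mean_cbm(shr_percent):
    """Mean Manhattan distance, converting each pair with CBM = 2 * (1 - SHR)."""
    return sum(2 * (1 - v / 100) for v in shr_percent) / len(shr_percent)


print(round(mean_cbm(GROUP_B), 2), round(mean_cbm(GROUP_E), 2))  # 0.33 0.64
```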